{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Accessing Remote Resources\n", "==========================\n", "\n", "Web pages and data\n", "------------------\n", "\n", "I have mentioned before how you can access data files on your hard drive, but Python also allows you to access remote data, for example on the internet. The easiest way to do this is to use the [requests](https://pypi.python.org/pypi/requests) module. To start off, you can simply fetch a URL with `requests.get`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "response = requests.get('http://xkcd.com/353/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "``response`` now holds the server's response. You can access the content as text via the ``text`` attribute:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(response.text[:300]) # only print the first 300 characters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can either use this information directly or, in some cases, write it to a file. Let's download just the image from the comic above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r2 = requests.get('https://imgs.xkcd.com/comics/python.png')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r2.headers # You can see that 'Content-Type' is 'image/png'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "r2.text[:100] # The first 100 characters, decoded as text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, this doesn't seem to be actual text. Instead, it's a binary format. 
The binary data of the response can be accessed via the `content` attribute:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r2.content[:100]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the `\\x89PNG` at the beginning, which identifies the byte string as PNG data. Many binary file formats start with a short signature describing the format that is at least partially human-readable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can open a new file in binary mode and write the downloaded data to it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open('downloaded_image.png', 'wb') as f:\n", "    f.write(r2.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now load and display the image. One way is to use `matplotlib`'s `image` module." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import matplotlib.image as mpimg\n", "img = mpimg.imread('downloaded_image.png')\n", "fig1 = plt.figure(figsize=(18, 16), dpi=80, facecolor='w', edgecolor='k')\n", "plt.imshow(img, cmap='gist_gray')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another option is to use the Python Imaging Library (PIL) module, which offers several standard procedures for image processing (e.g. blurring, sharpening, resizing):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from PIL import Image\n", "img2 = Image.open('downloaded_image.png')\n", "fig2 = plt.figure(figsize=(18, 16), dpi=80, facecolor='w', edgecolor='k')\n", "plt.imshow(img2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "APIs\n", "----\n", "\n", "Imagine that you want to access some data online. 
\n", "A number of websites now offer an \"Application Programming Interface\" (or API), which is basically a means of accessing data in a machine-readable form. \n", "An example for weather data is http://openweathermap.org/API \n", "\n", "To use an API, we often need an access key. This key is usually generated for you, e.g. if you want to access the cloud services of one of the well-known providers. The following example shows how the key is passed to this particular API." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the API provided by Openweathermap.org to get the current weather data for Heidelberg.\n", "Instructions on how to use this API are provided [here](https://openweathermap.org/current)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We will fetch the weather report for Heidelberg (latitude = 49.407681, longitude = 8.69079 decimal degrees)\n", "\n", "import requests\n", "#r1=requests.get('http://samples.openweathermap.org/data/2.5/weather?lat=51.51,lon=-0.13&APPID=329bded0f436c203622bd75ca56dc93f')\n", "r1 = requests.get('http://api.openweathermap.org/data/2.5/weather?lat=49.408&lon=8.691&APPID=329bded0f436c203622bd75ca56dc93f')\n", "print(r1.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also search by city name. Let's try London." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#r2 = requests.get('http://samples.openweathermap.org/data/2.5/weather?q=London,UK&APPID=329bded0f436c203622bd75ca56dc93f')\n", "r2 = requests.get('http://api.openweathermap.org/data/2.5/weather?q=London,UK&APPID=329bded0f436c203622bd75ca56dc93f')\n", "print(r2.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another example of an organization that provides access to its (cloud-based) archives via an API is the Las Cumbres Observatory (LCO), which provides access to millions of astronomical images. 
Querying and downloading files requires Python scripts if one wants to search the archive automatically. Scientific users are provided with instructions on how to use the interface:\n", "\n", "https://developers.lco.global/#data-format-definition" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Saving and restoring data efficiently\n", "----------------------------------------------\n", "\n", "As we have seen before, there are multiple ways of opening and processing data. You can, of course, always resort to writing data line by line to disk. In practice, there are several alternatives for writing Python data to disk, and some of them are more efficient than others.\n", "\n", "First of all, when you are working with NumPy arrays and structures you might want to consider using a built-in function such as `np.savetxt`:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# example from https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html\n", "import numpy as np\n", "x = y = z = np.arange(0.0, 5.0, 1.0)\n", "np.savetxt('test.out', x, delimiter=',') # x is an array\n", "np.savetxt('test.out', (x, y, z)) # x, y, z are equal-sized 1D arrays; this overwrites the file written above" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes one wants to restore the exact, current state of a NumPy array without writing all digits to disk in human-readable form. To achieve that, NumPy comes with the dedicated `numpy.save` function. It stores the array in a compact binary format instead of spending characters on a text representation. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "with open('test.npy','wb') as fp:\n", " np.save(fp,x)\n", "print(x)\n", "\n", "with open('test.npy','rb') as fp2:\n", " y = np.load(fp2)\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 1 }